I got about 100k running/walking routes stored in mongodb that I need to group by similarity. By similar I mean that each point between two routes are within a distance thats no bigger than a given threshold. The points are tuples of lat, lng.

Getting the data

I generated a dump from mongodb and then exported what i needed into a csv file. I'm using a smaller set for testing purposes, with only 1k routes.

In [124]:
!head -n 3 routes1000.csv




I just need an id and the route's points. The points are formated as linestrings: lng0 lat0, lng1 lat1, ... Let's load these and do some work:

In [125]:
import sys
import csv
from collections import namedtuple


csv.field_size_limit(sys.maxsize)

Route = namedtuple("Route", ['id', 'points'])


def load(dataset):
    reader = csv.reader(open(dataset), delimiter='\t')
    routes = []
    for (_id, route) in reader:
        route = Route(_id, tuple( tuple(float(p) for p in point.split() ) 
                                           for point in route.split(',')))
        routes.append(route)
    return routes

routes = load('routes1000.csv')
#take out one point routes
routes = [r for r in routes if len(r.points) > 1]
routes[0]


Out[125]:
Route(id='509425c021ce593a88dc99a8', points=((-43.20368, -22.96509), (-43.20371, -22.9651), (-43.20355, -22.96636), (-43.20358, -22.96679), (-43.2037, -22.96721), (-43.20398, -22.96761), (-43.20637, -22.97005), (-43.20643, -22.96999), (-43.20637, -22.97005), (-43.20782, -22.97148), (-43.20826, -22.97198), (-43.20835, -22.97215), (-43.20837, -22.97229), (-43.20835, -22.97247), (-43.20815, -22.97304), (-43.20789, -22.97369), (-43.20777, -22.97361), (-43.20751, -22.97422), (-43.20743, -22.97431), (-43.20734, -22.97436), (-43.20716, -22.97441), (-43.20704, -22.97441), (-43.20676, -22.97427), (-43.20346, -22.97248), (-43.20317, -22.9724), (-43.20299, -22.97241), (-43.20274, -22.97249), (-43.20245, -22.9727), (-43.20148, -22.97366), (-43.20145, -22.97363), (-43.20148, -22.97366), (-43.20103, -22.97409), (-43.20084, -22.97418), (-43.20059, -22.97424), (-43.20056, -22.97433), (-43.20057, -22.97444), (-43.20061, -22.97451), (-43.20069, -22.97456), (-43.20078, -22.97457), (-43.20088, -22.97454), (-43.20036, -22.97521), (-43.19951, -22.97643), (-43.19922, -22.97692), (-43.19912, -22.97736), (-43.19921, -22.97778), (-43.19939, -22.97813), (-43.19963, -22.97845), (-43.20019, -22.97867), (-43.20069, -22.97878), (-43.20655, -22.98046), (-43.20711, -22.98055), (-43.20773, -22.98059), (-43.20816, -22.98058), (-43.20833, -22.98057), (-43.20832, -22.9805), (-43.20833, -22.98057), (-43.20898, -22.98051), (-43.21289, -22.98033), (-43.21452, -22.9803), (-43.2152, -22.98005), (-43.21691, -22.9792), (-43.2177, -22.97886), (-43.21803, -22.97843), (-43.21876, -22.97711), (-43.21883, -22.97695), (-43.21883, -22.97687), (-43.21733, -22.96817), (-43.21347, -22.96367)))
As lat's and lng's are just x's and y's, let's scatter plot these to have an idea of what they look like.

In [126]:
%matplotlib inline
import pylab
from itertools import cycle


def plot(series):
    colors = cycle(['r', 'g', 'b', 'c', 'm', 'y', 'k'])
    fig = pylab.plt.figure()
    ax = fig.add_subplot(111)
    ax.figure.set_size_inches(16, 12)
    for s in series:
        ax.scatter(*zip(*s), marker='D', alpha=0.5, color=next(colors))

plot(route.points for route in routes[:2])


As you can see, those routes are pretty different and far away from each other. Hopefuly we will be able to detect that without having to resort to hiring people to look/compare to each route. That would suck.

Which takes us to the main point of this: detecting similar routes. From my research I found lots of sugestions on how to aproach this problem. I'm no GIS wizard and mongodb doesnt really have much in this area that can help me with this problem(or maybe it does? apart from *near queries, I dont really see much). So I ended up going with the well known Hausdorff distance algorithm.

A direct translation of the algori to python of the pseudocode presented on this very helpful page is given bellow. It does not account for forward nor backwards distances but that change is trivial.


In [127]:
from math import sqrt, pow

def euclidean(x, y):
    """Calculates the euclidean distance between points x and y"""
    return sqrt(pow(x[0] - y[0], 2) + pow(x[1] - y[1], 2))

def hausdorff0(A, B, dist=euclidean):
    """Maximum distance of a set to the nearest point in the other set
    Rougth pseudocode to python
    http://cgm.cs.mcgill.ca/~godfried/teaching/cg-projects/98/normand/main.html
    """
    h = 0
    for a in A:
        shortest = None
        for b in B:
            distance = dist(a, b)
            if distance < shortest:
                shortest = distance
        if shortest > h:
            h = shortest
    return h
But if you take another look at Hausdorff's formula, you see that you can make an identical translation to python using a more of a functional idiom. I think the result is cleaner, so that's what I'm going to use, but that's just my taste. Also, in this new version, we're accounting for the distance between the two objects instead of the distance from A to B that is given in the first version. I also use a simple decorator to normalize the output of the function to be a value between 0 and 1.

In [128]:
from functools import wraps

def normalize(fn):
    """Normalizes the output of a function to a value between 0..1"""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        return 1/(1+sqrt(fn(*args, **kwargs)))
    return wrapper


@normalize
def hausdorff(A, B, dist=euclidean):
    def _hausdorff(A, B):
        return max(min(dist(a, b) for b in B) for a in A)
    return max((_hausdorff(A, B), _hausdorff(B, A)))
Next, we cluster the routes using the hausdorff distance. I havent lost much time optimizing this because I probably wont have to run this frequently. But still, the first time I ran this on all of the data it just wouldnt finish. That's mostly because the first version was really lazy and did unneded comparisons between routes(I just wanted to check if if worked (:). Also, it did not use all of my processing power, just one cpu. Threads wouldnt really be helpfull in here, as this is python(GIL) and this is a cpu bound task. I have to use processes and the multiprocessing module makes this a breeze.

In [129]:
from itertools import combinations
from multiprocessing import Pool
from itertools import chain

def chunks(l, n):
    """Divides a iterable into a list of n-sized chunks
    """
    for i in xrange(0, len(l), n):
        yield l[i:i+n]


def flatten(lol):
    """Flattens a list of lists into a list
    """
    return chain.from_iterable(lol)


def compare_routes(routes, dist=hausdorff, threshold=0.9):
    """Returns routes which have a similarity score higher than a given threshold
    """
    route_combinations = combinations(routes, 2)
    return [(r1, r2) for r1, r2 in route_combinations if dist(r1.points, r2.points) >= threshold]


pool = Pool(8) # spawn 8 processes

similar_routes = pool.map(compare_routes, chunks(routes, 10))
pool.close()
pool.join()

similar_routes[0]


Out[129]:
[(Route(id='509425c021ce593a88dc99a8', points=((-43.20368, -22.96509), (-43.20371, -22.9651), (-43.20355, -22.96636), (-43.20358, -22.96679), (-43.2037, -22.96721), (-43.20398, -22.96761), (-43.20637, -22.97005), (-43.20643, -22.96999), (-43.20637, -22.97005), (-43.20782, -22.97148), (-43.20826, -22.97198), (-43.20835, -22.97215), (-43.20837, -22.97229), (-43.20835, -22.97247), (-43.20815, -22.97304), (-43.20789, -22.97369), (-43.20777, -22.97361), (-43.20751, -22.97422), (-43.20743, -22.97431), (-43.20734, -22.97436), (-43.20716, -22.97441), (-43.20704, -22.97441), (-43.20676, -22.97427), (-43.20346, -22.97248), (-43.20317, -22.9724), (-43.20299, -22.97241), (-43.20274, -22.97249), (-43.20245, -22.9727), (-43.20148, -22.97366), (-43.20145, -22.97363), (-43.20148, -22.97366), (-43.20103, -22.97409), (-43.20084, -22.97418), (-43.20059, -22.97424), (-43.20056, -22.97433), (-43.20057, -22.97444), (-43.20061, -22.97451), (-43.20069, -22.97456), (-43.20078, -22.97457), (-43.20088, -22.97454), (-43.20036, -22.97521), (-43.19951, -22.97643), (-43.19922, -22.97692), (-43.19912, -22.97736), (-43.19921, -22.97778), (-43.19939, -22.97813), (-43.19963, -22.97845), (-43.20019, -22.97867), (-43.20069, -22.97878), (-43.20655, -22.98046), (-43.20711, -22.98055), (-43.20773, -22.98059), (-43.20816, -22.98058), (-43.20833, -22.98057), (-43.20832, -22.9805), (-43.20833, -22.98057), (-43.20898, -22.98051), (-43.21289, -22.98033), (-43.21452, -22.9803), (-43.2152, -22.98005), (-43.21691, -22.9792), (-43.2177, -22.97886), (-43.21803, -22.97843), (-43.21876, -22.97711), (-43.21883, -22.97695), (-43.21883, -22.97687), (-43.21733, -22.96817), (-43.21347, -22.96367))),
  Route(id='50947768e794b11cf7e82ebe', points=((-43.21717, -22.96728), (-43.21717, -22.96729), (-43.21717, -22.96729), (-43.21717, -22.96728), (-43.21717, -22.96729), (-43.21717, -22.96728), (-43.21717, -22.96729), (-43.21715, -22.9673), (-43.217, -22.96699), (-43.21593, -22.96531), (-43.21466, -22.96428), (-43.21338, -22.96365), (-43.21141, -22.96285), (-43.21101, -22.96277), (-43.20965, -22.96295), (-43.20899, -22.96312), (-43.20679, -22.96287), (-43.2056, -22.96275), (-43.20496, -22.96275), (-43.20465, -22.96295), (-43.20399, -22.96361), (-43.20387, -22.96401), (-43.20364, -22.96574), (-43.20353, -22.96644), (-43.20357, -22.96687), (-43.20372, -22.96721), (-43.2037, -22.96721), (-43.20398, -22.96761), (-43.20487, -22.96853), (-43.20515, -22.9688), (-43.20515, -22.9688), (-43.20515, -22.9688), (-43.20644, -22.97013), (-43.20647, -22.97011), (-43.20644, -22.97013), (-43.2075, -22.97116), (-43.20782, -22.97148), (-43.20826, -22.97198), (-43.20831, -22.97208), (-43.20828, -22.97209), (-43.20831, -22.97208), (-43.20835, -22.97215), (-43.20837, -22.97229), (-43.20835, -22.97247), (-43.20815, -22.97304), (-43.20776, -22.974), (-43.20774, -22.97399), (-43.20776, -22.974), (-43.20765, -22.97428), (-43.20758, -22.97437), (-43.20743, -22.97449), (-43.2073, -22.97452), (-43.20715, -22.97453), (-43.20715, -22.97452), (-43.20715, -22.97453), (-43.20708, -22.97453), (-43.20692, -22.97447), (-43.20529, -22.97362), (-43.20488, -22.97343), (-43.20487, -22.97345), (-43.20488, -22.97343), (-43.20407, -22.97302), (-43.20375, -22.97297), (-43.20349, -22.97297), (-43.20315, -22.97301), (-43.20291, -22.97305), (-43.2026, -22.97314), (-43.2021, -22.97338), (-43.2021, -22.97339), (-43.2021, -22.97338), (-43.20176, -22.97363), (-43.20088, -22.97454), (-43.20067, -22.97481), (-43.20071, -22.97483), (-43.20067, -22.97481), (-43.20036, -22.97521), (-43.19993, -22.97584), (-43.19996, -22.97586), (-43.19993, -22.97584), (-43.19951, -22.97643), (-43.19922, -22.97692), (-43.19912, -22.97736), (-43.19914, -22.97749), (-43.19921, -22.97778), (-43.19939, -22.97813), (-43.19963, -22.97845), (-43.20019, -22.97867), (-43.20069, -22.97878), (-43.20337, -22.97955), (-43.20507, -22.98004), (-43.20509, -22.97999), (-43.20507, -22.98004), (-43.20569, -22.98022), (-43.20655, -22.98046), (-43.20711, -22.98055), (-43.20773, -22.98059), (-43.20816, -22.98058), (-43.20898, -22.98051), (-43.21244, -22.98034), (-43.21273, -22.98033), (-43.21273, -22.98026), (-43.21273, -22.98033), (-43.21349, -22.98032), (-43.21405, -22.9803), (-43.21431, -22.9803), (-43.21452, -22.9803), (-43.21473, -22.98022), (-43.2152, -22.98005), (-43.21606, -22.97961), (-43.2165, -22.9794), (-43.21733, -22.97902), (-43.2177, -22.97886), (-43.21786, -22.97866), (-43.21803, -22.97843), (-43.21812, -22.97826), (-43.21849, -22.97758), (-43.2188, -22.97701), (-43.21883, -22.97695), (-43.21884, -22.97679), (-43.21871, -22.97679), (-43.21882, -22.97641), (-43.2185, -22.9755), (-43.21778, -22.97349), (-43.21741, -22.97179), (-43.21733, -22.96971), (-43.2174, -22.96784), (-43.21719, -22.96736), (-43.21725, -22.96735), (-43.21741, -22.96772), (-43.21737, -22.96954), (-43.21744, -22.9714), (-43.21763, -22.97252), (-43.21809, -22.9746), (-43.21877, -22.97607), (-43.21887, -22.97644), (-43.21881, -22.97668), (-43.21871, -22.9771), (-43.21809, -22.97826), (-43.21767, -22.9788), (-43.21738, -22.97893), (-43.21608, -22.97954), (-43.21469, -22.98018), (-43.21451, -22.98025), (-43.21123, -22.98035), (-43.2083, -22.98053), (-43.20743, -22.98053), (-43.20683, -22.98054), (-43.2043, -22.97978), (-43.2003, -22.97865), (-43.19966, -22.97842), (-43.19909, -22.9774), (-43.19929, -22.97685), (-43.19981, -22.97609), (-43.20075, -22.97476), (-43.20126, -22.97408), (-43.20222, -22.97327), (-43.20325, -22.97304), (-43.20368, -22.97302), (-43.20401, -22.97304), (-43.20485, -22.97347), (-43.20604, -22.97406), (-43.2072, -22.97455), (-43.2075, -22.97449), (-43.20774, -22.97415), (-43.20823, -22.97292), (-43.2084, -22.97226), (-43.20827, -22.97192), (-43.20662, -22.97026), (-43.20543, -22.96901), (-43.20443, -22.96799), (-43.20376, -22.9672), (-43.20359, -22.96646), (-43.20386, -22.96442), (-43.20401, -22.96366), (-43.20462, -22.96303), (-43.20488, -22.96285), (-43.20512, -22.96278), (-43.20545, -22.96278), (-43.20592, -22.96284), (-43.20711, -22.96296), (-43.20771, -22.96301), (-43.20868, -22.96314), (-43.20893, -22.96315), (-43.20913, -22.96315), (-43.20957, -22.96304), (-43.21012, -22.96284), (-43.21102, -22.96282), (-43.21157, -22.96294), (-43.2132, -22.96354), (-43.21399, -22.96398), (-43.21469, -22.96433), (-43.21532, -22.96477), (-43.21589, -22.9652), (-43.21632, -22.96581), (-43.21719, -22.96721)))),
 (Route(id='509425c021ce593a88dc99a8', points=((-43.20368, -22.96509), (-43.20371, -22.9651), (-43.20355, -22.96636), (-43.20358, -22.96679), (-43.2037, -22.96721), (-43.20398, -22.96761), (-43.20637, -22.97005), (-43.20643, -22.96999), (-43.20637, -22.97005), (-43.20782, -22.97148), (-43.20826, -22.97198), (-43.20835, -22.97215), (-43.20837, -22.97229), (-43.20835, -22.97247), (-43.20815, -22.97304), (-43.20789, -22.97369), (-43.20777, -22.97361), (-43.20751, -22.97422), (-43.20743, -22.97431), (-43.20734, -22.97436), (-43.20716, -22.97441), (-43.20704, -22.97441), (-43.20676, -22.97427), (-43.20346, -22.97248), (-43.20317, -22.9724), (-43.20299, -22.97241), (-43.20274, -22.97249), (-43.20245, -22.9727), (-43.20148, -22.97366), (-43.20145, -22.97363), (-43.20148, -22.97366), (-43.20103, -22.97409), (-43.20084, -22.97418), (-43.20059, -22.97424), (-43.20056, -22.97433), (-43.20057, -22.97444), (-43.20061, -22.97451), (-43.20069, -22.97456), (-43.20078, -22.97457), (-43.20088, -22.97454), (-43.20036, -22.97521), (-43.19951, -22.97643), (-43.19922, -22.97692), (-43.19912, -22.97736), (-43.19921, -22.97778), (-43.19939, -22.97813), (-43.19963, -22.97845), (-43.20019, -22.97867), (-43.20069, -22.97878), (-43.20655, -22.98046), (-43.20711, -22.98055), (-43.20773, -22.98059), (-43.20816, -22.98058), (-43.20833, -22.98057), (-43.20832, -22.9805), (-43.20833, -22.98057), (-43.20898, -22.98051), (-43.21289, -22.98033), (-43.21452, -22.9803), (-43.2152, -22.98005), (-43.21691, -22.9792), (-43.2177, -22.97886), (-43.21803, -22.97843), (-43.21876, -22.97711), (-43.21883, -22.97695), (-43.21883, -22.97687), (-43.21733, -22.96817), (-43.21347, -22.96367))),
  Route(id='509480d11a345f37f24e90c8', points=((-43.21716, -22.96728), (-43.21715, -22.9673), (-43.21699, -22.96698), (-43.21593, -22.9653), (-43.21465, -22.96428), (-43.21338, -22.96364), (-43.2114, -22.96284), (-43.211, -22.96276), (-43.20964, -22.96294), (-43.20899, -22.96311), (-43.2056, -22.96274), (-43.20495, -22.96274), (-43.20464, -22.96294), (-43.20399, -22.9636), (-43.20387, -22.96401), (-43.20353, -22.96644), (-43.20357, -22.96686), (-43.20372, -22.96721), (-43.2037, -22.96721), (-43.20398, -22.96761), (-43.20487, -22.96853), (-43.20515, -22.96879), (-43.20644, -22.97013), (-43.20647, -22.9701), (-43.20644, -22.97013), (-43.20782, -22.97148), (-43.20826, -22.97198), (-43.20831, -22.97208), (-43.20828, -22.97209), (-43.20831, -22.97208), (-43.20835, -22.97215), (-43.20837, -22.97229), (-43.20835, -22.97247), (-43.20815, -22.97304), (-43.20776, -22.974), (-43.20773, -22.97399), (-43.20776, -22.974), (-43.20765, -22.97428), (-43.20758, -22.97437), (-43.20743, -22.97449), (-43.2073, -22.97452), (-43.20715, -22.97453), (-43.20708, -22.97453), (-43.20692, -22.97447), (-43.20529, -22.97362), (-43.20488, -22.97343), (-43.20487, -22.97344), (-43.20488, -22.97343), (-43.20407, -22.97302), (-43.20375, -22.97297), (-43.20349, -22.97297), (-43.20291, -22.97305), (-43.2026, -22.97314), (-43.2021, -22.97338), (-43.20176, -22.97363), (-43.20088, -22.97454), (-43.20067, -22.97481), (-43.2007, -22.97483), (-43.20067, -22.97481), (-43.20036, -22.97521), (-43.19993, -22.97584), (-43.19995, -22.97585), (-43.19993, -22.97584), (-43.19951, -22.97643), (-43.19922, -22.97692), (-43.19912, -22.97736), (-43.19921, -22.97778), (-43.19939, -22.97813), (-43.19963, -22.97845), (-43.20019, -22.97867), (-43.20069, -22.97878), (-43.20507, -22.98004), (-43.20508, -22.97998), (-43.20507, -22.98004), (-43.20655, -22.98046), (-43.20711, -22.98055), (-43.20773, -22.98059), (-43.20816, -22.98058), (-43.20898, -22.98051), (-43.21273, -22.98033), (-43.21272, -22.98026), (-43.21273, -22.98033), (-43.21452, -22.9803), (-43.2152, -22.98005), (-43.2165, -22.9794), (-43.2177, -22.97886), (-43.21803, -22.97843), (-43.21883, -22.97695), (-43.21884, -22.97679), (-43.21871, -22.97678), (-43.21882, -22.97641), (-43.21777, -22.97348), (-43.21741, -22.97178), (-43.21732, -22.96971), (-43.2174, -22.96783), (-43.21718, -22.96736), (-43.21725, -22.96735), (-43.21741, -22.96771), (-43.21737, -22.96953), (-43.21743, -22.9714), (-43.21762, -22.97251), (-43.21809, -22.9746), (-43.21876, -22.97606), (-43.21887, -22.97644), (-43.21871, -22.9771), (-43.21809, -22.97825), (-43.21767, -22.9788), (-43.21468, -22.98018), (-43.2145, -22.98025), (-43.21123, -22.98035), (-43.20829, -22.98053), (-43.20682, -22.98054), (-43.2043, -22.97977), (-43.2003, -22.97865), (-43.19965, -22.97841), (-43.19908, -22.97739), (-43.19929, -22.97684), (-43.1998, -22.97608), (-43.20125, -22.97408), (-43.20222, -22.97327), (-43.20325, -22.97304), (-43.20368, -22.97302), (-43.20401, -22.97304), (-43.20604, -22.97406), (-43.2072, -22.97455), (-43.2075, -22.97448), (-43.20773, -22.97414), (-43.20823, -22.97292), (-43.2084, -22.97226), (-43.20827, -22.97191), (-43.20662, -22.97025), (-43.20443, -22.96798), (-43.20375, -22.96719), (-43.20358, -22.96646), (-43.20386, -22.96441), (-43.20401, -22.96365), (-43.20461, -22.96302), (-43.20488, -22.96284), (-43.20511, -22.96277), (-43.20545, -22.96277), (-43.20592, -22.96283), (-43.2077, -22.963), (-43.20868, -22.96313), (-43.20892, -22.96314), (-43.20913, -22.96314), (-43.20957, -22.96303), (-43.21011, -22.96283), (-43.21102, -22.96281), (-43.21156, -22.96293), (-43.21319, -22.96354), (-43.21399, -22.96398), (-43.21468, -22.96433), (-43.21532, -22.96476), (-43.21589, -22.96519), (-43.21632, -22.96581), (-43.21718, -22.9672)))),
 (Route(id='5094298e1a345f2ffde509f9', points=((-43.36265, -23.01066), (-43.36265, -23.01072), (-43.36099, -23.01067), (-43.35686, -23.01046), (-43.34993, -23.01029), (-43.34751, -23.01029), (-43.34444, -23.01036), (-43.34231, -23.01039), (-43.33979, -23.01046), (-43.33328, -23.01076), (-43.32929, -23.01107), (-43.32471, -23.01165), (-43.32326, -23.01186), (-43.32279, -23.01195), (-43.31886, -23.01292), (-43.31791, -23.01303), (-43.31369, -23.01402), (-43.31331, -23.01403), (-43.31268, -23.01413), (-43.30995, -23.01469), (-43.30886, -23.01489), (-43.30775, -23.015), (-43.30584, -23.01509), (-43.3032, -23.01481), (-43.29982, -23.0144), (-43.29982, -23.01437))),
  Route(id='509437981a345f2ff08fbf6a', points=((-43.36265, -23.01066), (-43.36265, -23.01072), (-43.36099, -23.01067), (-43.35686, -23.01046), (-43.34993, -23.01029), (-43.34751, -23.01029), (-43.34444, -23.01036), (-43.34231, -23.01039), (-43.33979, -23.01046), (-43.33328, -23.01076), (-43.32929, -23.01107), (-43.32471, -23.01165), (-43.32326, -23.01186), (-43.32279, -23.01195), (-43.31886, -23.01292), (-43.31791, -23.01303), (-43.31369, -23.01402), (-43.31331, -23.01403), (-43.31268, -23.01413), (-43.30995, -23.01469), (-43.30886, -23.01489), (-43.30775, -23.015), (-43.30584, -23.01509), (-43.3032, -23.01481), (-43.29982, -23.0144), (-43.29982, -23.01437)))),
 (Route(id='50947768e794b11cf7e82ebe', points=((-43.21717, -22.96728), (-43.21717, -22.96729), (-43.21717, -22.96729), (-43.21717, -22.96728), (-43.21717, -22.96729), (-43.21717, -22.96728), (-43.21717, -22.96729), (-43.21715, -22.9673), (-43.217, -22.96699), (-43.21593, -22.96531), (-43.21466, -22.96428), (-43.21338, -22.96365), (-43.21141, -22.96285), (-43.21101, -22.96277), (-43.20965, -22.96295), (-43.20899, -22.96312), (-43.20679, -22.96287), (-43.2056, -22.96275), (-43.20496, -22.96275), (-43.20465, -22.96295), (-43.20399, -22.96361), (-43.20387, -22.96401), (-43.20364, -22.96574), (-43.20353, -22.96644), (-43.20357, -22.96687), (-43.20372, -22.96721), (-43.2037, -22.96721), (-43.20398, -22.96761), (-43.20487, -22.96853), (-43.20515, -22.9688), (-43.20515, -22.9688), (-43.20515, -22.9688), (-43.20644, -22.97013), (-43.20647, -22.97011), (-43.20644, -22.97013), (-43.2075, -22.97116), (-43.20782, -22.97148), (-43.20826, -22.97198), (-43.20831, -22.97208), (-43.20828, -22.97209), (-43.20831, -22.97208), (-43.20835, -22.97215), (-43.20837, -22.97229), (-43.20835, -22.97247), (-43.20815, -22.97304), (-43.20776, -22.974), (-43.20774, -22.97399), (-43.20776, -22.974), (-43.20765, -22.97428), (-43.20758, -22.97437), (-43.20743, -22.97449), (-43.2073, -22.97452), (-43.20715, -22.97453), (-43.20715, -22.97452), (-43.20715, -22.97453), (-43.20708, -22.97453), (-43.20692, -22.97447), (-43.20529, -22.97362), (-43.20488, -22.97343), (-43.20487, -22.97345), (-43.20488, -22.97343), (-43.20407, -22.97302), (-43.20375, -22.97297), (-43.20349, -22.97297), (-43.20315, -22.97301), (-43.20291, -22.97305), (-43.2026, -22.97314), (-43.2021, -22.97338), (-43.2021, -22.97339), (-43.2021, -22.97338), (-43.20176, -22.97363), (-43.20088, -22.97454), (-43.20067, -22.97481), (-43.20071, -22.97483), (-43.20067, -22.97481), (-43.20036, -22.97521), (-43.19993, -22.97584), (-43.19996, -22.97586), (-43.19993, -22.97584), (-43.19951, -22.97643), (-43.19922, -22.97692), (-43.19912, -22.97736), (-43.19914, -22.97749), (-43.19921, -22.97778), (-43.19939, -22.97813), (-43.19963, -22.97845), (-43.20019, -22.97867), (-43.20069, -22.97878), (-43.20337, -22.97955), (-43.20507, -22.98004), (-43.20509, -22.97999), (-43.20507, -22.98004), (-43.20569, -22.98022), (-43.20655, -22.98046), (-43.20711, -22.98055), (-43.20773, -22.98059), (-43.20816, -22.98058), (-43.20898, -22.98051), (-43.21244, -22.98034), (-43.21273, -22.98033), (-43.21273, -22.98026), (-43.21273, -22.98033), (-43.21349, -22.98032), (-43.21405, -22.9803), (-43.21431, -22.9803), (-43.21452, -22.9803), (-43.21473, -22.98022), (-43.2152, -22.98005), (-43.21606, -22.97961), (-43.2165, -22.9794), (-43.21733, -22.97902), (-43.2177, -22.97886), (-43.21786, -22.97866), (-43.21803, -22.97843), (-43.21812, -22.97826), (-43.21849, -22.97758), (-43.2188, -22.97701), (-43.21883, -22.97695), (-43.21884, -22.97679), (-43.21871, -22.97679), (-43.21882, -22.97641), (-43.2185, -22.9755), (-43.21778, -22.97349), (-43.21741, -22.97179), (-43.21733, -22.96971), (-43.2174, -22.96784), (-43.21719, -22.96736), (-43.21725, -22.96735), (-43.21741, -22.96772), (-43.21737, -22.96954), (-43.21744, -22.9714), (-43.21763, -22.97252), (-43.21809, -22.9746), (-43.21877, -22.97607), (-43.21887, -22.97644), (-43.21881, -22.97668), (-43.21871, -22.9771), (-43.21809, -22.97826), (-43.21767, -22.9788), (-43.21738, -22.97893), (-43.21608, -22.97954), (-43.21469, -22.98018), (-43.21451, -22.98025), (-43.21123, -22.98035), (-43.2083, -22.98053), (-43.20743, -22.98053), (-43.20683, -22.98054), (-43.2043, -22.97978), (-43.2003, -22.97865), (-43.19966, -22.97842), (-43.19909, -22.9774), (-43.19929, -22.97685), (-43.19981, -22.97609), (-43.20075, -22.97476), (-43.20126, -22.97408), (-43.20222, -22.97327), (-43.20325, -22.97304), (-43.20368, -22.97302), (-43.20401, -22.97304), (-43.20485, -22.97347), (-43.20604, -22.97406), (-43.2072, -22.97455), (-43.2075, -22.97449), (-43.20774, -22.97415), (-43.20823, -22.97292), (-43.2084, -22.97226), (-43.20827, -22.97192), (-43.20662, -22.97026), (-43.20543, -22.96901), (-43.20443, -22.96799), (-43.20376, -22.9672), (-43.20359, -22.96646), (-43.20386, -22.96442), (-43.20401, -22.96366), (-43.20462, -22.96303), (-43.20488, -22.96285), (-43.20512, -22.96278), (-43.20545, -22.96278), (-43.20592, -22.96284), (-43.20711, -22.96296), (-43.20771, -22.96301), (-43.20868, -22.96314), (-43.20893, -22.96315), (-43.20913, -22.96315), (-43.20957, -22.96304), (-43.21012, -22.96284), (-43.21102, -22.96282), (-43.21157, -22.96294), (-43.2132, -22.96354), (-43.21399, -22.96398), (-43.21469, -22.96433), (-43.21532, -22.96477), (-43.21589, -22.9652), (-43.21632, -22.96581), (-43.21719, -22.96721))),
  Route(id='509480d11a345f37f24e90c8', points=((-43.21716, -22.96728), (-43.21715, -22.9673), (-43.21699, -22.96698), (-43.21593, -22.9653), (-43.21465, -22.96428), (-43.21338, -22.96364), (-43.2114, -22.96284), (-43.211, -22.96276), (-43.20964, -22.96294), (-43.20899, -22.96311), (-43.2056, -22.96274), (-43.20495, -22.96274), (-43.20464, -22.96294), (-43.20399, -22.9636), (-43.20387, -22.96401), (-43.20353, -22.96644), (-43.20357, -22.96686), (-43.20372, -22.96721), (-43.2037, -22.96721), (-43.20398, -22.96761), (-43.20487, -22.96853), (-43.20515, -22.96879), (-43.20644, -22.97013), (-43.20647, -22.9701), (-43.20644, -22.97013), (-43.20782, -22.97148), (-43.20826, -22.97198), (-43.20831, -22.97208), (-43.20828, -22.97209), (-43.20831, -22.97208), (-43.20835, -22.97215), (-43.20837, -22.97229), (-43.20835, -22.97247), (-43.20815, -22.97304), (-43.20776, -22.974), (-43.20773, -22.97399), (-43.20776, -22.974), (-43.20765, -22.97428), (-43.20758, -22.97437), (-43.20743, -22.97449), (-43.2073, -22.97452), (-43.20715, -22.97453), (-43.20708, -22.97453), (-43.20692, -22.97447), (-43.20529, -22.97362), (-43.20488, -22.97343), (-43.20487, -22.97344), (-43.20488, -22.97343), (-43.20407, -22.97302), (-43.20375, -22.97297), (-43.20349, -22.97297), (-43.20291, -22.97305), (-43.2026, -22.97314), (-43.2021, -22.97338), (-43.20176, -22.97363), (-43.20088, -22.97454), (-43.20067, -22.97481), (-43.2007, -22.97483), (-43.20067, -22.97481), (-43.20036, -22.97521), (-43.19993, -22.97584), (-43.19995, -22.97585), (-43.19993, -22.97584), (-43.19951, -22.97643), (-43.19922, -22.97692), (-43.19912, -22.97736), (-43.19921, -22.97778), (-43.19939, -22.97813), (-43.19963, -22.97845), (-43.20019, -22.97867), (-43.20069, -22.97878), (-43.20507, -22.98004), (-43.20508, -22.97998), (-43.20507, -22.98004), (-43.20655, -22.98046), (-43.20711, -22.98055), (-43.20773, -22.98059), (-43.20816, -22.98058), (-43.20898, -22.98051), (-43.21273, -22.98033), (-43.21272, -22.98026), (-43.21273, -22.98033), (-43.21452, -22.9803), (-43.2152, -22.98005), (-43.2165, -22.9794), (-43.2177, -22.97886), (-43.21803, -22.97843), (-43.21883, -22.97695), (-43.21884, -22.97679), (-43.21871, -22.97678), (-43.21882, -22.97641), (-43.21777, -22.97348), (-43.21741, -22.97178), (-43.21732, -22.96971), (-43.2174, -22.96783), (-43.21718, -22.96736), (-43.21725, -22.96735), (-43.21741, -22.96771), (-43.21737, -22.96953), (-43.21743, -22.9714), (-43.21762, -22.97251), (-43.21809, -22.9746), (-43.21876, -22.97606), (-43.21887, -22.97644), (-43.21871, -22.9771), (-43.21809, -22.97825), (-43.21767, -22.9788), (-43.21468, -22.98018), (-43.2145, -22.98025), (-43.21123, -22.98035), (-43.20829, -22.98053), (-43.20682, -22.98054), (-43.2043, -22.97977), (-43.2003, -22.97865), (-43.19965, -22.97841), (-43.19908, -22.97739), (-43.19929, -22.97684), (-43.1998, -22.97608), (-43.20125, -22.97408), (-43.20222, -22.97327), (-43.20325, -22.97304), (-43.20368, -22.97302), (-43.20401, -22.97304), (-43.20604, -22.97406), (-43.2072, -22.97455), (-43.2075, -22.97448), (-43.20773, -22.97414), (-43.20823, -22.97292), (-43.2084, -22.97226), (-43.20827, -22.97191), (-43.20662, -22.97025), (-43.20443, -22.96798), (-43.20375, -22.96719), (-43.20358, -22.96646), (-43.20386, -22.96441), (-43.20401, -22.96365), (-43.20461, -22.96302), (-43.20488, -22.96284), (-43.20511, -22.96277), (-43.20545, -22.96277), (-43.20592, -22.96283), (-43.2077, -22.963), (-43.20868, -22.96313), (-43.20892, -22.96314), (-43.20913, -22.96314), (-43.20957, -22.96303), (-43.21011, -22.96283), (-43.21102, -22.96281), (-43.21156, -22.96293), (-43.21319, -22.96354), (-43.21399, -22.96398), (-43.21468, -22.96433), (-43.21532, -22.96476), (-43.21589, -22.96519), (-43.21632, -22.96581), (-43.21718, -22.9672))))]
The result is a list with lists of similar routes. We got to group those into buckets of similar routes and then display that.

In [139]:
from collections import defaultdict
from uuid import uuid1

def group_routes(similar_routes):
    group = defaultdict(set)
    for r1, r2 in similar_routes:
        found = False
        for k, v in group.iteritems():
            if (r1 in v) or (r2 in v):
                #add to existing bucket
                group[k].add(r1)
                group[k].add(r2)
                found = True
                break
        if not found:
            #create a new bucket
            bucket_id = str(uuid1())
            group[bucket_id].add(r1)
            group[bucket_id].add(r2)
    return group

route_groups = group_routes(flatten(similar_routes))

print "We've got {} routes groups".format(len(route_groups))

# lets take a look at the first 10
for group in route_groups.values()[:10]:
    plot([route.points for route in group])


We've got 99 routes groups
So that's it, all similar routes grouped in different buckets. If you take a look at the plots, some clusters dont seem to be nothing but one route. That's not really what's happening. The thing is, on the system this data came from, the users are capable of indicating that they ran a 'template' route, instead of using a GPS to track their run. That means that "single" route is just the same route, over and over on top of each other. This still works for my needs.